List of AI News about ARC AGI 2
| Time | Details |
|---|---|
| 2026-03-02 23:53 | **ARC-AGI-2 Results: Chinese Open-Weight Models Underperform Frontier LLMs (Data-Backed Analysis)** According to ARC Prize on X, results on the semi-private ARC-AGI-2 evaluation set show the following scores and costs: Kimi K2.5 scored 12% at $0.28, Minimax M2.5 5% at $0.17, GLM-5 5% at $0.27, and DeepSeek V3.2 4% at $0.12. All four scores fall below those of the July 2025 frontier-lab models that ARC Prize cites for comparison (source: ARC Prize; post amplified by Ethan Mollick). According to ARC Prize, these results indicate that current Chinese open-weight models are strong on narrow tasks but weaker than leading closed models on generalization and out-of-distribution reasoning. This performance gap has direct business impact on reliability-critical use cases such as autonomous agents and complex tool-use pipelines. As reported by ARC Prize, the cost-performance figures suggest competitive token pricing but insufficient reasoning yield. This points enterprises toward hybrid stacks that use frontier closed models for the hardest reasoning and open-weight models for domain-specific, cost-sensitive workflows. |
| 2026-02-19 16:21 | **Gemini 3.1 Pro Launch: Latest Benchmark Breakthrough with 77.1% ARC-AGI-2 Score (2026 Analysis)** According to Demis Hassabis on X, Google DeepMind has launched Gemini 3.1 Pro with major gains in core reasoning and problem solving. The model scored 77.1% on the ARC-AGI-2 benchmark, more than double Gemini 3 Pro’s score. It began rolling out in the Gemini app and Antigravity on the day of the announcement (source: @demishassabis). As reported by Hassabis, these improvements signal stronger generalization and few-shot capabilities. Those gains can translate into higher accuracy for enterprise agents, code assistants, and automated analytics workflows. According to the announcement, immediate availability in Google’s products lets partners integrating Gemini 3.1 Pro through its app ecosystems move faster on A/B testing, developer adoption, and monetization. |
